Add opt-in resumable uploads to the file upload URL endpoint#5995
Add opt-in resumable uploads to the file upload URL endpoint#5995rtibbles wants to merge 2 commits into
Conversation
- supports_resumable flag on the GCS storage backends - get_stored_object_md5: dedup lookup against an object's GCS-computed md5 - create_resumable_upload_session: pins md5 + declared-size metadata - hex_to_base64 checksum helper Part of learningequality#5975. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UzUP3UYP4cLyouvssXyekj
- accept a `resumable` flag (defaults off) - GCS: skip when the stored md5 matches the checksum, else return a server-initiated resumable session URI - non-GCS backends fall back to single-PUT - reject non-resumable uploads over 500 MB - make `size` an IntegerField, dropping the redundant float casts Part of learningequality#5975. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UzUP3UYP4cLyouvssXyekj
bjester
left a comment
There was a problem hiding this comment.
I feel this creates some technical debt due to it propagating the poor existing patterns in the storage code. Aligning the functionality with the GCS specific storage classes would make the code cleaner.
| def create_resumable_upload_session(filepath, md5_b64, size, storage=default_storage): | ||
| client = storage.get_client() | ||
| blob = client.get_bucket(settings.AWS_S3_BUCKET_NAME).blob(filepath) | ||
| blob.md5_hash = md5_b64.strip() | ||
| blob.content_type = determine_content_type(filepath) | ||
| blob.metadata = {"declared-size": str(size)} | ||
| return blob.create_resumable_upload_session(client=client) |
There was a problem hiding this comment.
This continues the poor pattern started by get_presigned_upload_url. In a yet unmerged PR, I started work to move that to the client classes, because its implementation is entirely client specific.
The logic in this function is entirely specific to the GCS client. While there is the supports_resumable gate, this function gives no indication of that, and even defaults storage to default_storage, which could be incompatible. This makes the code unclear without knowledge of the API shapes between GCS and local emulators.
My suggestion would be to move in the same direction as the code I shared. Unfortunately, it appears I also introduced a shared base class there, so ideally you do the same so it's defined on the Storage API interface, and have the local/minio one throw an error.
Summary
Add opt-in resumable upload to the file upload URL endpoint.
Do this to allow resumable uploads and upload of much larger files.
References
Fixes #5975
Reviewer guidance
sizevalidation changed from float to integer (FileUploadURLSerializer) — fractional sizes now 400. In the frontend of Studio we send the file size in bytes as reported by theinputelement.MAX_NON_RESUMABLE_UPLOAD_SIZE) are now rejected and must opt intoresumable- no impact on client side uploads in Studio.mimetype/might_skipand returnsalreadyUploadeddeclared-sizemetadata only, enforced by a separate post-completion hook.AI usage
Used Claude Code to implement the endpoint and GCS storage helpers, following the existing presigned-URL flow. Reviewed the diff in a local self-review pass — tightened the test suite to assert behavior rather than DRF/mock internals, and named the md5 helper so its base64 return is clear at the call site. Verified with the backend test suite.